week 8: multilevel models

MELSM

double-advanced potions multilevel models

  • Multivariate MLMs (M-MLM)

    • Fit models with 2+ (don’t get crazy) outcomes.
    • The advantage is estimating the correlations between the outcomes, as well as between their Level 2 (random) coefficients.
  • Mixed Effects Location Scale Models (MELSM)

    • Estimate both the mean (location) and variance (scale) of the repeated measures for each of your clusters.

We’ll use data provided by Williams and colleagues (2021).

d <- read_csv("https://raw.githubusercontent.com/sjweston/uobayes/refs/heads/main/files/data/external_data/williams.csv")
# scaled time variable
d <- d %>% mutate(day01 = (day - 2) / max((day - 2)))
distinct(d, record_id) %>% count()
# A tibble: 1 × 1
      n
  <int>
1   193
d %>% 
  count(record_id) %>% 
  summarise(median = median(n),
            min = min(n),
            max = max(n))
# A tibble: 1 × 3
  median   min   max
   <int> <int> <int>
1     74     8    99
d %>% 
  count(record_id) %>% 
  ggplot(aes(x = n)) +
  geom_histogram(fill = "#1c5253", color = "white") +
  scale_x_continuous("number of days", limits = c(0, NA)) 
rethinking::precis(d) 
                   mean           sd         5.5%        94.5%      histogram
...1      7950.07235479 4992.5487783 931.76000000 1.651024e+04     ▇▇▇▇▇▅▅▃▂▁
P_A.std      0.01438236    1.0241384  -1.68129708 1.751466e+00       ▁▁▃▇▇▃▁▁
day         44.17962096   27.6985612   6.00000000 9.200000e+01     ▇▇▇▇▅▅▅▃▃▃
P_A.lag      0.01992587    1.0183833  -1.72043510 1.717036e+00     ▁▂▃▅▇▇▅▃▂▁
N_A.lag     -0.04575229    0.9833161  -0.87507181 1.990468e+00 ▇▃▂▁▁▁▁▁▁▁▁▁▁▁
steps.pm     0.05424387    0.6298941  -1.02580678 1.011356e+00       ▁▂▇▇▃▁▁▁
steps.pmd    0.02839160    0.7575395  -1.12359512 1.229974e+00 ▁▁▇▇▁▁▁▁▁▁▁▁▁▁
record_id   98.61029694   63.7493291  10.00000000 2.070000e+02   ▇▇▇▃▅▅▅▃▃▃▂▂
N_A.std     -0.07545093    0.9495660  -1.04842926 1.811061e+00     ▁▁▁▇▂▁▁▁▁▁
day01        0.43040430    0.2826384   0.04081633 9.183673e-01     ▇▇▇▇▅▅▅▃▃▃
set.seed(14)

d %>% 
  nest(data=-record_id) %>% 
  slice_sample(n = 16) %>% 
  unnest(data) %>% 
  ggplot(aes(x = day, y = N_A.lag)) +
  geom_line(color = "grey") +
  geom_point(color = "#1c5253", size = 1/2) +
  geom_line(aes(y = P_A.lag), color = "darkgrey") +
  geom_point(aes(y = P_A.lag), color = "#e07a5f", size = 1/2) +
  ylab("affect (standardized)") +
  facet_wrap(~ record_id)

Let’s fit an MLM with varying intercepts and slopes.

Likelihood function and linear model

\[\begin{align*} \text{NA}_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha_{\text{id}[i]} + \beta_{\text{id}[i]}\text{time}_i \end{align*}\]

Varying intercepts and slopes:

\[\begin{align*} \alpha_{\text{id}[i]} &= \alpha + u_{\alpha,\text{id}[i]} \\ \beta_{\text{id}[i]} &= \beta + u_{\beta,\text{id}[i]} \end{align*}\]

Random effects:

\[\begin{align*} \begin{bmatrix} u_{\alpha,\text{id}[i]} \\ u_{\beta,\text{id}[i]} \end{bmatrix} &\sim \text{MVNormal}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \mathbf{S}\right) \\ \mathbf{S} &= \begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta}\end{pmatrix}\mathbf{R}\begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta}\end{pmatrix} \end{align*}\]
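Multiplying the decomposition out makes the role of \(\mathbf{R}\) explicit: the diagonal entries of \(\mathbf{S}\) are the variances, and the off-diagonal is the intercept-slope correlation \(\rho\) scaled by both standard deviations:

\[\mathbf{S} = \begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta}\end{pmatrix}\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta}\end{pmatrix} = \begin{pmatrix} \sigma_{\alpha}^2 & \rho\,\sigma_{\alpha}\sigma_{\beta} \\ \rho\,\sigma_{\alpha}\sigma_{\beta} & \sigma_{\beta}^2 \end{pmatrix}\]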

Priors: \[\begin{align*} \alpha &\sim \text{Normal}(0,0.2) \\ \beta &\sim \text{Normal}(0,1) \\ \sigma &\sim \text{Exponential}(1) \\ \sigma_{\alpha} &\sim \text{Exponential}(1) \\ \sigma_{\beta} &\sim \text{Exponential}(1) \\ \mathbf{R} &\sim \text{LKJcorr}(2) \end{align*}\]

m1 <-
  brm(data = d,
      family = gaussian,
      N_A.std ~ 1 + day01 + (1 + day01 | record_id),
      prior = c(prior(normal(0, 0.2), class = Intercept),
                prior(normal(0, 1), class = b),
                prior(exponential(1), class = sd),
                prior(exponential(1), class = sigma),
                prior(lkj(2), class = cor)),
      iter = 3000, warmup = 1000, chains = 4, cores = 4,
      seed = 14,
      file = here("files/models/m81.1"))
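Once the model is fit, here is one way you might inspect the population-level effects and the per-person deviations (a sketch; `ranef()` and `coef()` return 3-dimensional arrays indexed by cluster, summary statistic, and parameter):

```r
library(brms)

summary(m1)  # alpha, beta, sigma, the random-effect SDs, and the intercept-slope correlation

# per-person deviations u_alpha, u_beta (first five participants)
ranef(m1)$record_id[1:5, , ]

# per-person intercepts and slopes (i.e., alpha + u_alpha, beta + u_beta)
coef(m1)$record_id[1:5, , ]
```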

What if I wanted to model positive affect? Usually, I’d have to fit a second model.

But what if I told you I could estimate both positive and negative affect simultaneously?

Likelihood function

\[\begin{align*} \text{NA}_i &\sim \text{Normal}(\mu_{\text{NA},i}, \sigma_{\text{NA}}) \\ \mu_{\text{NA},i} &= \alpha_{\text{NA},\text{id}[i]} + \beta_{\text{NA},\text{id}[i]}\,\text{time}_i \\ \\ \text{PA}_i &\sim \text{Normal}(\mu_{\text{PA},i}, \sigma_{\text{PA}}) \\ \mu_{\text{PA},i} &= \alpha_{\text{PA},\text{id}[i]} + \beta_{\text{PA},\text{id}[i]}\,\text{time}_i \end{align*}\]

Varying intercepts and slopes:

\[\begin{align*} \alpha_{\text{NA},\text{id}[i]} &= \alpha_{\text{NA}} + u_{\alpha,\text{NA},\text{id}[i]} \\ \beta_{\text{NA},\text{id}[i]} &= \beta_{\text{NA}} + u_{\beta,\text{NA},\text{id}[i]} \\ \alpha_{\text{PA},\text{id}[i]} &= \alpha_{\text{PA}} + u_{\alpha,\text{PA},\text{id}[i]} \\ \beta_{\text{PA},\text{id}[i]} &= \beta_{\text{PA}} + u_{\beta,\text{PA},\text{id}[i]} \end{align*}\]

Random Effects: Multivariate Distribution

\[\begin{align*} \mathbf{u}_{\text{id}[i]} = \begin{bmatrix} u_{\alpha,\text{NA},\text{id}[i]} \\ u_{\beta,\text{NA},\text{id}[i]} \\ u_{\alpha,\text{PA},\text{id}[i]} \\ u_{\beta,\text{PA},\text{id}[i]} \end{bmatrix} &\sim \text{MVNormal}(\mathbf{0}, \boldsymbol{\Sigma}) \\ \boldsymbol{\Sigma} &= \mathbf{L} \mathbf{R} \mathbf{L} \quad \\ \text{with} \quad \mathbf{L} &= \text{diag}(\sigma_{\alpha,\text{NA}}, \sigma_{\beta,\text{NA}}, \sigma_{\alpha,\text{PA}}, \sigma_{\beta,\text{PA}}) \end{align*}\]

Priors

\[\begin{align*} \alpha_{\text{NA}}, \alpha_{\text{PA}} &\sim \text{Normal}(0, 0.2) \\ \beta_{\text{NA}}, \beta_{\text{PA}} &\sim \text{Normal}(0, 1) \\ \sigma_{\text{NA}}, \sigma_{\text{PA}} &\sim \text{Exponential}(1) \\ \sigma_{\alpha,\text{NA}}, \sigma_{\beta,\text{NA}}, \sigma_{\alpha,\text{PA}}, \sigma_{\beta,\text{PA}} &\sim \text{Exponential}(1) \\ \mathbf{R} &\sim \text{LKJcorr}(2) \end{align*}\]

bf_na <- bf(N_A.std ~ 1 + day01 + N_A.lag + P_A.lag + 
              (1 + day01 + N_A.lag + P_A.lag | c | record_id))
bf_pa <- bf(P_A.std ~ 1 + day01 + N_A.lag + P_A.lag + 
              (1 + day01 + N_A.lag + P_A.lag | c | record_id))
m2 <-
  brm(data = d,
      family = gaussian,
      bf_na + bf_pa + set_rescor(TRUE),
      prior = c(prior(normal(0, 0.2), class = Intercept),
                prior(normal(0, 1), class = b),
                prior(exponential(1), class = sd),
                prior(exponential(1), class = sigma),
                prior(lkj(2), class = cor)),
      iter = 3000, warmup = 1000, chains = 4, cores = 4,
      seed = 14,
      file = here("files/models/m81.2"))
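The payoff of the multivariate model is the correlations. A sketch of how to pull them out (parameter names follow brms conventions, e.g. `rescor__` for the residual correlation; run `variables(m2)` to confirm the exact names in your fit):

```r
library(brms)
library(tidyverse)

summary(m2)  # includes rescor(NAstd, PAstd), the residual correlation between outcomes

# posterior means for the random-effect and residual correlations
as_draws_df(m2) %>% 
  select(starts_with("cor_"), starts_with("rescor")) %>% 
  summarise(across(everything(), mean))
```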

Let’s make this better! Both models so far assume a single residual standard deviation for everyone. A MELSM lets each person have their own residual variance by putting a submodel on the (log) scale parameter.

\[\begin{align*} \text{NA}_{ij} &\sim \text{Normal}(\mu_{ij}, \sigma_i) \\ \mu_{ij} &= \beta_0 + \beta_1\text{time}_{ij} + u_{0i} + u_{1i}\text{time}_{ij}\\ \log(\sigma_i) &= \eta_0 + u_{2i} \\ \begin{bmatrix} u_{0i} \\ u_{1i} \\ u_{2i} \end{bmatrix} &\sim \text{MVNormal}(\mathbf{0}, \boldsymbol{\Sigma}) \end{align*}\]

m3 <-
  brm(data = d,
      family = gaussian,
      bf(N_A.std ~ 1 + day01 + (1 + day01 |i| record_id),
         sigma ~ 1 + (1 |i| record_id)),
      prior = c(prior(normal(0, 0.2), class = Intercept),
                prior(normal(0, 1), class = b),
                prior(exponential(1), class = sd),
                prior(normal(0,1), class = Intercept, dpar=sigma),
                prior(exponential(1), class = sd, dpar=sigma),
                prior(lkj(2), class = cor)),
      iter = 3000, warmup = 1000, chains = 4, cores = 4,
      seed = 14,
      file = here("files/models/m81.3"))
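Because the scale submodel is on the log scale, the per-person residual SDs need to be exponentiated. A sketch, assuming the brms naming convention `sigma_Intercept` for the distributional intercept:

```r
library(brms)

# person-specific values of eta_0 + u_2i, i.e. log(sigma_i)
person_log_sigma <- coef(m3)$record_id[, "Estimate", "sigma_Intercept"]

# exponentiate to get each person's residual SD; first five participants
exp(person_log_sigma)[1:5]
```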